KiaDev Intelligence

#supervised reinforcement learning

01/11/2025

SRL: Teaching 7B Models to Reason Step-by-Step on Hard Math and Code

SRL converts expert trajectories into per-step actions, each with its own reward, and lets the model produce a private reasoning span before each action. The resulting dense learning signal boosts open 7B models on hard math and coding tasks.
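As a rough illustration of the idea in the summary, the sketch below splits one expert trajectory into per-step training examples and scores a predicted action against the expert's action. The function names (`per_step_examples`, `step_reward`) and the use of `difflib` string similarity as the reward are assumptions made for this example, not details confirmed by the summary.

```python
from difflib import SequenceMatcher


def per_step_examples(problem, expert_steps):
    """Turn one expert trajectory into (context, target_action) pairs.

    Each example's context is the problem plus all earlier expert steps,
    so every step of the trajectory yields its own training signal.
    """
    examples = []
    for i, action in enumerate(expert_steps):
        context = (problem + "\n" + "\n".join(expert_steps[:i])).strip()
        examples.append((context, action))
    return examples


def step_reward(predicted_action, expert_action):
    """Hypothetical dense reward: string similarity in [0, 1].

    Any private reasoning the model writes before the action is assumed
    to be stripped beforehand, so only the action itself is scored.
    """
    return SequenceMatcher(None, predicted_action, expert_action).ratio()


examples = per_step_examples("Solve x + 1 = 3", ["x = 3 - 1", "x = 2"])
for context, target in examples:
    print(repr(context), "->", repr(target))
```

Scoring each step separately means a partially correct solution still earns partial credit, which is what makes the signal dense compared with a single reward on the final answer.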
